title: week 1 - Probability and Statistics for Business - I
dg-publish: true
tags:
  - lecture
  - "#set_theory"
  - "#basic_probability"
  - "#sem_1_Probability_and_Statistics_for_Business_part_1"
date: 2026-01-18

week 1

1. Statistic categorization
- 1.1 Descriptive statistics
- 1.2 Inferential statistics
2. Individuals, variables, and observations
- 2.1 Types of variables
  - 2.1.1 Types of Categorical (Qualitative) Variables
  - 2.1.2 Types of Numerical (Quantitative) Variables
- 2.2 The Four Scales of Measurement
3. Summary Statistics

1. Statistic categorization

Statistics is categorized into two main parts:

Descriptive statistics
Inferential statistics

1.1 Descriptive statistics

Descriptive statistics summarizes the data we already have and describes the features we can extract from it. It consists of:

data collection
organization
presentation
analysis

Examples:

Finding the average height of all the students in a class when every student's height is known.

1.2 Inferential statistics

Inferential statistics draws conclusions about a large population by examining a representative subset (a sample) of that population.

Probability measures how likely an event is to occur; it is a way of quantifying uncertainty.

Example:

When we need to find the average height of the whole school, we measure only the heights of a smaller group that represents the population and draw conclusions from that group.

2. Individuals, variables, and observations

Individuals: the entities being studied (people, objects, events, etc.)
Variables: the characteristics of an individual (columns of a table)
Observations: the data collected from those individuals (rows of a table)

2.1 Types of variables

Two types

Categorical variable (Qualitative)
Numerical variable (Quantitative)

Feature	Categorical Variable (Qualitative)	Numerical Variable (Quantitative)
Definition	Represents groups or categories that cannot be measured numerically	Represents quantitative values that can be measured and calculated
Data Type	Text or labels (sometimes coded as numbers)	Numbers (integers or decimals)
Examples	Gender (Male/Female), Eye color (Blue/Brown), Blood type (A/B/AB/O)	Age (25, 30), Height (170 cm), Salary ($5000)

2.1.1 Types of Categorical (Qualitative) Variables

Type	Definition	Examples
Nominal	Categories with no specific order	Colors: Red, Blue, Green; Blood type: A, B, AB, O
Ordinal	Categories with a specific order, but uneven differences between them	Ratings: Low, Medium, High; Education level: High School, Bachelor, Master
Dichotomous / Binary	Only two categories	Yes/No; Male/Female; Pass/Fail

2.1.2 Types of Numerical (Quantitative) Variables

Type	Definition	Examples
Interval	Continuous scale with equal gaps, but no true zero	Temperature in Celsius or Fahrenheit
Ratio	Continuous scale with equal gaps and a true zero	Height, Weight, Age, Salary

A true zero means that the zero point on the scale represents the complete absence of the quantity being measured.

0°C does not mean “no temperature”; it is just a point on the scale. So the Celsius scale has no true zero.

2.2 The Four Scales of Measurement

Scale	Key Features	Notes / Examples
Nominal	- Two or more categories - No intrinsic order	Examples: Gender (Male/Female), Blood type (A/B/AB/O)
Ordinal	- Two or more categories - Categories can be ordered/ranked - Magnitude between values not equal	Examples: Education level (High School < Bachelor < Master), Ratings (Low, Medium, High)
Interval	- Measured on a continuous scale - No absolute zero - Differences between adjacent values are equal - Ratios not defined	Examples: Temperature in Celsius or Fahrenheit
Ratio	- Interval variable with absolute zero - Ratios between values are defined	Examples: Height, Weight, Age, Salary

3. Summary Statistics

Summary statistics are used in descriptive statistics to:

summarize a set of observations
communicate information simply and quickly

3.1 Measure of Location (Central Tendency)

Tells us where the center or typical value of the data set lies

The main measures of central tendency are:

Mean -> the average
Median -> the middle value
Mode -> the most frequent value

Example:

This tells us that the data is centered around 6.

3.1.1. Simple Arithmetic Mean

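Notation:

Mean of a population = $\mu$
Sample mean (when sample observations are known) = $\bar{x}$
Sample mean (when sample observations are unknown) = $\bar{X}$
Count of numbers in the population = $N$
Count of numbers in the sample = $n$

Given the population values $x_1, x_2, \dots, x_N$:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$

Given a sample (values unknown) $X_1, X_2, \dots, X_n$:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Given a sample (values known) $x_1, x_2, \dots, x_n$:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$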

3.1.2 Weighted Arithmetic Mean

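The simple arithmetic mean gives equal importance to all the observations in a data set.

When the relative importance of the items in the distribution is not the same, we use the weighted arithmetic mean instead.

Given observations $x_1, x_2, \dots, x_n$ and weights $w_1, w_2, \dots, w_n$ corresponding to those observations, then

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$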

Exercise:

A student’s final marks in Mathematics, English, Statistics and Computer are 92, 76, 95, and 80 respectively. If the respective credits received for these courses are 2, 1, 3 and 4, determine an appropriate average mark.
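The exercise can be checked with a short Python sketch. It assumes the credits act as the weights; the function name `weighted_mean` is illustrative and does not come from the lecture.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


marks = [92, 76, 95, 80]   # Mathematics, English, Statistics, Computer
credits = [2, 1, 3, 4]     # credits used as weights (assumption stated above)

average_mark = weighted_mean(marks, credits)
print(average_mark)        # (184 + 76 + 285 + 320) / 10
```

Under this assumption the weighted average mark is 86.5, compared with a simple (unweighted) mean of 85.75.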

3.1.3 Combined/Composite Arithmetic Mean

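Simple arithmetic means of two or more related groups can be combined into a composite mean.

Given $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$, the simple arithmetic means of $k$ related groups, and $n_1, n_2, \dots, n_k$, the respective sample sizes, then

$$\bar{x}_c = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$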

Exercise:

There are three branches of a company, employing 100, 20, and 80 persons respectively. If the arithmetic means of the monthly salaries paid by the three branches are Rs. 50000, Rs. 60000, and Rs. 80000 respectively, find the arithmetic mean of the salaries of all the employees of the three branches.
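A minimal sketch of this exercise in Python, weighting each branch mean by its head count. The helper name `combined_mean` is an assumption made for illustration.

```python
def combined_mean(group_means, group_sizes):
    """Composite mean: sum(n_i * mean_i) / sum(n_i)."""
    if len(group_means) != len(group_sizes):
        raise ValueError("need one size per group mean")
    total_size = sum(group_sizes)
    return sum(n * m for m, n in zip(group_means, group_sizes)) / total_size


branch_means = [50000, 60000, 80000]   # mean monthly salary per branch (Rs.)
branch_sizes = [100, 20, 80]           # employees per branch

overall_mean = combined_mean(branch_means, branch_sizes)
print(overall_mean)                    # 12,600,000 / 200
```

The combined mean works out to Rs. 63000. It sits closer to the means of the two large branches than the unweighted average of the three branch means would.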

3.1.4 Standard Geometric Mean ($GM$)

Primarily used to determine overall growth rates or averages of percentages over time.
It can only be used with positive numbers.

For a set of positive observations $x_1, x_2, \dots, x_n$:

$$GM = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$$

(This notation means you multiply all the numbers together and then take the $n$-th root, where $n$ is the count of numbers in the set.)

Exercise:

The economic growth rate of a country for the last three years is 4%, 2%, and 8%. What is the overall growth rate?

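Convert percentages to growth factors: 1.04, 1.02, 1.08
Apply the geometric mean formula

$$GM = \sqrt[3]{1.04 \times 1.02 \times 1.08}$$

Multiply the values

$$1.04 \times 1.02 \times 1.08 = 1.145664$$

Take the cube root

$$\sqrt[3]{1.145664} \approx 1.0464$$

Convert back to a percentage growth rate

$$(1.0464 - 1) \times 100\% \approx 4.64\%$$

The average yearly economic growth rate over the three years is approximately 4.6%.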

3.1.5 Geometric Mean (An Alternative Version)

This version is used when dealing with growth rates that may include negative numbers (as long as each rate is greater than -1, that is, -100%).

If the values are $r_1, r_2, \dots, r_n$ (written as decimals), the alternative geometric mean is:

$$GM = \sqrt[n]{(1 + r_1)(1 + r_2)\cdots(1 + r_n)} - 1$$

Exercise:

The economic growth rate of a country for the last three years is 4%, -2%, and 8%. What is the overall growth rate?

Convert to growth factors: 1.04, 0.98, 1.08
Multiply them: $1.04 \times 0.98 \times 1.08 = 1.100736$
Take the cube root: $\sqrt[3]{1.100736} \approx 1.0325$
Convert to percentage: $(1.0325 - 1) \times 100\% \approx 3.25\%$

The average yearly growth rate is approximately 3.25%.
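Both growth-rate exercises (4%, 2%, 8% and 4%, -2%, 8%) follow the same steps, so they can be checked together. The sketch below assumes rates are passed as decimals; `average_growth_rate` is a hypothetical helper name.

```python
import math


def average_growth_rate(rates):
    """Geometric-mean growth rate for rates given as decimals (each > -1)."""
    if not rates:
        raise ValueError("need at least one rate")
    if any(r <= -1 for r in rates):
        raise ValueError("each rate must be greater than -1 (-100%)")
    factors = [1 + r for r in rates]          # convert to growth factors
    product = math.prod(factors)              # multiply them together
    return product ** (1 / len(rates)) - 1    # n-th root, back to a rate


all_positive = average_growth_rate([0.04, 0.02, 0.08])
with_negative = average_growth_rate([0.04, -0.02, 0.08])

print(f"{all_positive:.2%}")
print(f"{with_negative:.2%}")
```

The two results are about 4.64% and 3.25% per year respectively.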

3.1.6 Harmonic Mean ($HM$)

This is especially useful when dealing with rates or units, for example, calculating average speed.

For a set of observations $x_1, x_2, \dots, x_n$, the harmonic mean is:

$$HM = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Where:

$n$ = number of observations
$x_i$ = each observation

Exercise:

The speeds of three deliveries (in km per hour) of a fast bowler are 60, 90, and 100. What is the average speed?

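Write the harmonic mean formula

$$HM = \frac{3}{\frac{1}{60} + \frac{1}{90} + \frac{1}{100}}$$

Find the reciprocals

$$\frac{1}{60} \approx 0.01667, \quad \frac{1}{90} \approx 0.01111, \quad \frac{1}{100} = 0.01$$

Add the reciprocals

$$0.01667 + 0.01111 + 0.01 \approx 0.03778$$

Divide to find the harmonic mean

$$HM = \frac{3}{0.03778} \approx 79.41$$

Final Answer

The average speed is approximately 79.41 km/h.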
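The same steps (reciprocals, sum, divide) can be written as a small Python sketch. The function below is an illustrative implementation rather than the lecture's own code, and it assumes all observations are positive.

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch assumes a non-empty list of positive values")
    reciprocal_sum = sum(1 / v for v in values)   # add the reciprocals
    return len(values) / reciprocal_sum           # divide n by that sum


speeds = [60, 90, 100]   # km per hour, from the exercise
hm_speed = harmonic_mean(speeds)
print(round(hm_speed, 2))
```

For these speeds the result is about 79.41 km/h, which is lower than the simple arithmetic mean of about 83.33 km/h.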

3.1.7 Relationship with Other Means

For any set of positive numbers with at least two different values, the following inequality holds:

$$HM < GM < AM$$

Where:

$HM$ = harmonic mean
$GM$ = geometric mean
$AM$ = arithmetic mean

This shows that the harmonic mean is always the smallest of the three averages. (If all the values are equal, the three means coincide.)

Exercise:

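Calculate the simple arithmetic mean, geometric mean, and harmonic mean of 2, 4, 4, and 8.

$$AM = \frac{2 + 4 + 4 + 8}{4} = \frac{18}{4} = 4.5$$

$$GM = \sqrt[4]{2 \times 4 \times 4 \times 8} = \sqrt[4]{256} = 4$$

$$HM = \frac{4}{\frac{1}{2} + \frac{1}{4} + \frac{1}{4} + \frac{1}{8}} = \frac{4}{1.125} \approx 3.56$$

Conclusion

$$HM \approx 3.56 < GM = 4 < AM = 4.5$$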

This confirms that the harmonic mean is the smallest, followed by the geometric mean, and the arithmetic mean is the largest.
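The ordering for 2, 4, 4, 8 can also be checked with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). This is only a verification sketch.

```python
import statistics

data = [2, 4, 4, 8]

am = statistics.mean(data)             # arithmetic mean
gm = statistics.geometric_mean(data)   # geometric mean (floating point)
hm = statistics.harmonic_mean(data)    # harmonic mean

print(am, round(gm, 4), round(hm, 4))
print(hm < gm < am)                    # the ordering claimed above
```

Because `geometric_mean` works in floating point, its result may differ from 4 by a tiny rounding error, which is why the printout rounds it.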

3.1.8 Mode (Mo)

The mode is the value of a variable that occurs most frequently in a distribution.

Types of Mode:

Unimodal: Only one mode in the series.
Bimodal: Two modes in the series.
Trimodal: Three modes in the series.
Multimodal: More than three modes in the series.

Exercise:

12 students in a class have the following shoe sizes. What is the mode?

Count the frequency of each value

Shoe Size	Frequency
6	1
7	1
8	5
9	3
10	2

Identify the mode
- The mode is the value with the highest frequency.
- Here, 8 occurs 5 times, which is more than any other value.
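A short sketch of the same count-and-pick procedure. The individual shoe sizes are rebuilt from the frequency table above, so their order is an assumption, but order does not affect the mode.

```python
import statistics
from collections import Counter

# Frequency table from the exercise: shoe size -> number of students
frequency = {6: 1, 7: 1, 8: 5, 9: 3, 10: 2}

# Expand the table back into one entry per student
shoe_sizes = list(Counter(frequency).elements())

# multimode returns every value tied for the highest frequency
modes = statistics.multimode(shoe_sizes)

print(len(shoe_sizes))   # number of students
print(modes)             # a single entry means the data is unimodal
```

Using `multimode` rather than `mode` makes bimodal or trimodal data visible, since it returns all tied values instead of only the first one.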

3.1.9 Median (Md)

The median is the middle value of a data set after arranging the observations in ascending order.

Let $n$ be the total number of observations.

Case 1: When $n$ is odd

The median is the value at the

$$\frac{n + 1}{2}\text{-th}$$

position.

Case 2: When $n$ is even

The median lies between the two positions:

$$\frac{n}{2} \quad \text{and} \quad \frac{n}{2} + 1$$

The median is the average of the values at these two positions.

Examples:

Example 1: Odd Number of Observations

The 3rd value is:

Example 2: Even Number of Observations
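The two cases can be sketched in Python as follows. The sample values are hypothetical and chosen only to illustrate the odd and even cases; they are not the lecture's data.

```python
def median(values):
    """Median via the position rules for odd and even n."""
    ordered = sorted(values)              # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]               # 1-based position (n + 1) / 2
    # 1-based positions n/2 and n/2 + 1, averaged
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 3, 9, 1, 5]              # hypothetical, n = 5
even_sample = [7, 3, 9, 1, 5, 11]         # hypothetical, n = 6

print(median(odd_sample))                 # 3rd value of 1, 3, 5, 7, 9
print(median(even_sample))                # average of the 3rd and 4th values
```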

3.2 Quartiles ($Q_1$, $Q_2$, $Q_3$)

Quartiles are values that divide an ordered data set into four equal parts.
Each part contains 25% of the total number of observations.

$Q_1$: First quartile (lower quartile)
$Q_2$: Second quartile (median)
$Q_3$: Third quartile (upper quartile)

Formula for the Location of Quartiles

Let $n$ be the total number of observations.
Location of the $k$-th quartile:

$$\text{Location of } Q_k = \frac{k(n + 1)}{4}, \quad k = 1, 2, 3$$

Exercise:

12 students in a class have the following weights (kg):

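Find $Q_1$, $Q_2$, and $Q_3$.

Arrange the data in ascending order

Identify the number of observations

$$n = 12$$

First Quartile ($Q_1$)

Location

$$\frac{1 \times (12 + 1)}{4} = 3.25$$

This lies between the 3rd and 4th items.

3rd item = 56
4th item = 59

Value of $Q_1$

$$Q_1 = 56 + 0.25 \times (59 - 56) = 56.75$$

Second Quartile ($Q_2$, Median)

Location

$$\frac{2 \times (12 + 1)}{4} = 6.5$$

This lies between the 6th and 7th items.

6th item = 67
7th item = 68

Value of $Q_2$

$$Q_2 = 67 + 0.5 \times (68 - 67) = 67.5$$

Third Quartile ($Q_3$)

Location

$$\frac{3 \times (12 + 1)}{4} = 9.75$$

This lies between the 9th and 10th items.

9th item = 79
10th item = 81

Value of $Q_3$

$$Q_3 = 79 + 0.75 \times (81 - 79) = 80.5$$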

Interpretation

25% of students weigh less than 56.75 kg
50% of students weigh less than 67.5 kg
75% of students weigh less than 80.5 kg
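The location-and-interpolation procedure can be sketched in Python. Only the 3rd, 4th, 6th, 7th, 9th and 10th ordered weights (56, 59, 67, 68, 79, 81) come from the exercise. The other six values in the list are hypothetical fillers, so that the list has 12 items.

```python
def quartile(values, k):
    """k-th quartile using the location k * (n + 1) / 4 with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    location = k * (n + 1) / 4          # 1-based, may be fractional
    lower = int(location)               # whole part: item just below the location
    fraction = location - lower         # how far to move toward the next item
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


# Items 3, 4, 6, 7, 9, 10 are from the exercise; the rest are hypothetical.
weights = [50, 53, 56, 59, 63, 67, 68, 72, 79, 81, 85, 90]

q1, q2, q3 = (quartile(weights, k) for k in (1, 2, 3))
print(q1, q2, q3)
```

With these values the sketch reproduces the quartiles quoted in the interpretation: 56.75, 67.5 and 80.5 kg.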

Advantages and Disadvantages of Mean, Median, and Mode

Measure	Advantages	Disadvantages
Mean	- Can be used with interval and ratio data - Has useful statistical properties (e.g., for variance, standard deviation) - Easy to understand and interpret	- Affected by extreme values (outliers) - May not appear in the actual data - Requires interval/ratio scale data - Works best with symmetric distributions
Median	- Can be used with ordinal data - Unaffected by extreme values - Easy to calculate - Symmetric data not required	- Restricted statistical uses - May not appear in the actual data
Mode	- Can be used with nominal data - Easy to calculate	- Restricted statistical uses - Limited use for analysis

3.3 Grouped Data Distributions

When dealing with large masses of raw data, it is often convenient to classify the data into groups or classes.

Frequency: The number of individuals or observations in each class.
Grouped data distribution: A table showing classes and their corresponding frequencies.

Example Table – Weights of 600 University Students

Class	Frequency
30 - 39	7
40 - 49	126
50 - 59	278
60 - 69	123
70 - 79	62
80 - 89	4

3.3.1 Class Interval and Class Limits

Class Interval: A symbol defining a class (e.g., 40–49).
Class Limits: The smallest and largest numbers in a class:
- Lower Class Limit (LCL): Smaller number (e.g., 40)
- Upper Class Limit (UCL): Larger number (e.g., 49)

Open Class Interval: A class that has no lower or upper limit indicated.

3.3.2 Class Boundaries / True class limits

If measurements are recorded to the nearest unit:

The interval 40–49 includes all values from 39.5 to 49.5.
- Lower Class Boundary (LCB): 39.5
- Upper Class Boundary (UCB): 49.5

Class	Frequency
30 - 39	7
40 - 49	126
50 - 59	278
60 - 69	123
70 - 79	62
80 - 89	4

Classes with Class Boundaries	Frequency
29.5 - 39.5	7
39.5 - 49.5	126
49.5 - 59.5	278
59.5 - 69.5	123
69.5 - 79.5	62
79.5 - 89.5	4

3.3.3 Size (Width) of a Class Interval ($C$)

Class Width: Difference between UCB and LCB.

$$C = UCB - LCB$$

For example, the class 40–49 has width $49.5 - 39.5 = 10$.

If all intervals have the same width, the common width is $C$.

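3.3.4 Class Mark / Midpoint ($x_i$)

The midpoint of a class interval, used as the representative value of all observations in the class (assuming they are evenly distributed within it).

$$x_i = \frac{LCL + UCL}{2} = \frac{LCB + UCB}{2}$$

Example: For class 40–49:

$$x_i = \frac{40 + 49}{2} = 44.5$$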

Summary Table for Example Data

Class	LCL	UCL	LCB	UCB	Width ($C$)	Midpoint ($x_i$)	Frequency ($f_i$)
30–39	30	39	29.5	39.5	10	34.5	7
40–49	40	49	39.5	49.5	10	44.5	126
50–59	50	59	49.5	59.5	10	54.5	278
60–69	60	69	59.5	69.5	10	64.5	123
70–79	70	79	69.5	79.5	10	74.5	62
80–89	80	89	79.5	89.5	10	84.5	4

3.4 Measures of Central Tendency for Grouped Data

For grouped data, we calculate Mean, Mode, Median, Quartiles, Deciles, and Percentiles using class midpoints, class frequencies, and cumulative frequencies.

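3.4.1 Mean ($\bar{x}$)

Let $x_1, x_2, \dots, x_k$ be the midpoints of $k$ class intervals,
and $f_1, f_2, \dots, f_k$ be their frequencies.

The mean is:

$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$

Example Table:

Class	Frequency ($f_i$)	Midpoint ($x_i$)	$f_i x_i$
30–39	7	34.5	241.5
40–49	126	44.5	5607
50–59	278	54.5	15151
60–69	123	64.5	7933.5
70–79	62	74.5	4619
80–89	4	84.5	338
Total	600		33890

$$\bar{x} = \frac{33890}{600} \approx 56.48 \text{ kg}$$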
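As a check on the arithmetic, the products of frequency and midpoint can be recomputed in Python from the class limits and frequencies in the table. For instance, $123 \times 64.5 = 7933.5$. The variable names are illustrative.

```python
# (lower class limit, upper class limit, frequency) for each class
classes = [
    (30, 39, 7),
    (40, 49, 126),
    (50, 59, 278),
    (60, 69, 123),
    (70, 79, 62),
    (80, 89, 4),
]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
frequencies = [f for _, _, f in classes]

products = [f * x for f, x in zip(frequencies, midpoints)]   # f_i * x_i per class
total_frequency = sum(frequencies)
sum_fx = sum(products)

grouped_mean = sum_fx / total_frequency
print(products)
print(total_frequency, sum_fx, round(grouped_mean, 2))
```

The total of the products is 33890 over 600 students, giving a grouped mean of about 56.48 kg.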

3.4.2 Mode ($Mo$) – Modal Class Method

The mode is the value that occurs most frequently. For grouped data, the modal class has the highest frequency.

$$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C$$

Where:

$L$ = Lower class boundary of modal class
$f_1$ = Frequency of modal class
$f_0$ = Frequency of class preceding modal class
$f_2$ = Frequency of class succeeding modal class
$\Delta_1 = f_1 - f_0$, $\Delta_2 = f_1 - f_2$
$C$ = Class width of modal class

Example Table:

Class	Frequency ($f$)
30–39	7
40–49	126
50–59	278 (modal)
60–69	123
70–79	62
80–89	4

The modal class = 50–59 (highest frequency = 278)
Identify Required Values

Symbol	Value
$L$	49.5 (lower boundary of modal class)
$f_1$	278 (frequency of modal class)
$f_0$	126 (frequency of class before modal class)
$f_2$	123 (frequency of class after modal class)
$C$	10 (class width)

Use the Modal Class Formula

$$Mo = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times C$$

Substitute the values:

$$Mo = 49.5 + \frac{278 - 126}{(278 - 126) + (278 - 123)} \times 10$$

Simplify

$$Mo = 49.5 + \frac{152}{307} \times 10 \approx 49.5 + 4.95 = 54.45 \text{ kg}$$
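The modal class method can be sketched in Python. The function picks the class with the highest frequency and applies the interpolation with the neighbouring frequencies. Treating a missing neighbour as frequency 0 is an assumption of this sketch, not something stated in the notes.

```python
def grouped_mode(lower_boundaries, frequencies, width):
    """Mode of grouped data via the modal class interpolation formula."""
    m = max(range(len(frequencies)), key=lambda i: frequencies[i])   # modal class index
    f_modal = frequencies[m]
    f_before = frequencies[m - 1] if m > 0 else 0                    # assumed 0 at the edge
    f_after = frequencies[m + 1] if m < len(frequencies) - 1 else 0  # assumed 0 at the edge
    d1 = f_modal - f_before
    d2 = f_modal - f_after
    if d1 + d2 == 0:
        raise ValueError("modal class is not distinct from its neighbours")
    return lower_boundaries[m] + d1 / (d1 + d2) * width


lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
frequencies = [7, 126, 278, 123, 62, 4]

mode_value = grouped_mode(lower_boundaries, frequencies, width=10)
print(round(mode_value, 2))
```

For the weights table this gives a mode of about 54.45 kg, inside the modal class 50–59.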

3.4.3 Quartiles ($Q_k$)

Find cumulative frequencies.
Use the formula for the $k$-th quartile:

$$Q_k = L + \frac{\frac{kN}{4} - CF}{f} \times C$$

Where:

$L$ = Lower class boundary of quartile class
$CF$ = Cumulative frequency before the quartile class
$f$ = Frequency of quartile class
$C$ = Class width of quartile class
$N$ = Total frequency

Example Table:

Class	Frequency ($f$)
30–39	7
40–49	126
50–59	278
60–69	123
70–79	62
80–89	4

Calculate Cumulative Frequency ($CF$)

Class	$f$	Cumulative Frequency ($CF$)
30–39	7	7
40–49	126	133
50–59	278	411
60–69	123	534
70–79	62	596
80–89	4	600

Calculate $Q_1$ (First Quartile)
- Position: $\frac{N}{4} = \frac{600}{4} = 150$
- Locate quartile class: 50–59 (CF before = 133, f = 278, LCB = 49.5, C = 10)

$$Q_1 = 49.5 + \frac{150 - 133}{278} \times 10 \approx 50.11$$

Calculate $Q_2$ (Median)
- Position: $\frac{2N}{4} = \frac{2 \times 600}{4} = 300$
- Quartile class = 50–59 (CF before = 133, f = 278, LCB = 49.5, C = 10)

$$Q_2 = 49.5 + \frac{300 - 133}{278} \times 10 \approx 55.51$$

Calculate $Q_3$ (Third Quartile)
- Position: $\frac{3N}{4} = \frac{3 \times 600}{4} = 450$
- Quartile class = 60–69 (CF before = 411, f = 123, LCB = 59.5, C = 10)

$$Q_3 = 59.5 + \frac{450 - 411}{123} \times 10 \approx 62.67$$

Quartile	Value (kg)
$Q_1$	50.11
$Q_2$	55.51
$Q_3$	62.67

25% of students weigh less than 50.11 kg
50% of students weigh less than 55.51 kg
75% of students weigh less than 62.67 kg
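The grouped quartile steps (cumulative frequencies, locating the class, interpolating) can be sketched in Python. The helper name `value_at_position` is illustrative, and the sketch assumes all classes share the same width.

```python
from itertools import accumulate

lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
frequencies = [7, 126, 278, 123, 62, 4]
width = 10                                    # common class width (assumed equal)

cumulative = list(accumulate(frequencies))    # running totals per class
total = cumulative[-1]


def value_at_position(position):
    """Interpolated value at a cumulative-frequency position (e.g. k * N / 4)."""
    for i, cf in enumerate(cumulative):
        if position <= cf:                    # first class whose CF reaches the position
            cf_before = cumulative[i - 1] if i > 0 else 0
            return lower_boundaries[i] + (position - cf_before) / frequencies[i] * width
    raise ValueError("position exceeds the total frequency")


q1, q2, q3 = (value_at_position(k * total / 4) for k in (1, 2, 3))
print(cumulative)
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Rounded to two decimals, the results match the quartile values quoted above: 50.11, 55.51 and 62.67 kg.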

3.4.4 Deciles ($D_k$)

Deciles are measures of position that divide a data set into 10 equal parts.

Each part contains 10% of the total observations
There are 9 deciles: $D_1, D_2, \dots, D_9$

Meaning of Each Decile

Decile	Interpretation
$D_1$	10% of data lie below this value
$D_2$	20% of data lie below this value
$D_3$	30% of data lie below this value
$D_4$	40% of data lie below this value
$D_5$	50% of data lie below this value (Median)
$D_6$	60% of data lie below this value
$D_7$	70% of data lie below this value
$D_8$	80% of data lie below this value
$D_9$	90% of data lie below this value

Find cumulative frequencies.
Use the formula for the $k$-th decile:

$$D_k = L + \frac{\frac{kN}{10} - CF}{f} \times C$$

Where:

$L$ = Lower class boundary of decile class
$CF$ = Cumulative frequency before the decile class
$f$ = Frequency of decile class
$C$ = Class width of decile class
$N$ = Total frequency

Example Table:

Class	Frequency ($f$)	Cumulative Frequency ($CF$)
30–39	7	7
40–49	126	133
50–59	278	411
60–69	123	534
70–79	62	596
80–89	4	600

Total number of observations:

$$N = 600$$

Example – Find the 4th Decile ($D_4$)

Locate the Decile Position

$$\frac{4N}{10} = \frac{4 \times 600}{10} = 240$$

The value 240 lies in the class 50–59, since:

CF before = 133
CF of class = 411

So, the decile class is 50–59.

Identify Required Values

Symbol	Value
$L$	49.5
$CF$	133
$f$	278
$C$	10

Substitute into the Formula

$$D_4 = 49.5 + \frac{240 - 133}{278} \times 10$$

Simplify

$$D_4 \approx 49.5 + 3.85 = 53.35 \text{ kg}$$

3.4.5 Percentiles ($P_k$)

Percentiles are measures of position that divide a data set into 100 equal parts.

Each part contains 1% of the total observations
There are 99 percentiles: $P_1, P_2, \dots, P_{99}$

Meaning of Percentiles

Percentile	Interpretation
$P_{10}$	10% of data lie below this value
$P_{25}$	25% of data lie below this value ($Q_1$)
$P_{50}$	50% of data lie below this value (Median)
$P_{75}$	75% of data lie below this value ($Q_3$)
$P_{90}$	90% of data lie below this value

Relation to Other Measures

$$Q_1 = P_{25}, \quad Q_2 = D_5 = P_{50}, \quad Q_3 = P_{75}, \quad D_k = P_{10k}$$

When Percentiles Are Used

To compare individual positions in large datasets
Widely used in exam results, income distributions, and rankings
Useful when data size is large and grouped

Find cumulative frequencies.
Use the formula for the $k$-th percentile:

$$P_k = L + \frac{\frac{kN}{100} - CF}{f} \times C$$

Where:

$L$ = Lower class boundary of percentile class
$CF$ = Cumulative frequency before the percentile class
$f$ = Frequency of percentile class
$C$ = Class width of percentile class
$N$ = Total frequency

Example Table:

Class (kg)	Frequency ($f$)	Cumulative Frequency ($CF$)
30–39	7	7
40–49	126	133
50–59	278	411
60–69	123	534
70–79	62	596
80–89	4	600

Total number of observations:

$$N = 600$$

Example: Find the 75th Percentile ($P_{75}$)

Locate the Percentile Position

$$\frac{75N}{100} = \frac{75 \times 600}{100} = 450$$

The value 450 lies in the class 60–69, since:

CF before = 411
CF of class = 534

So, the percentile class is 60–69.

Identify Required Values

Symbol	Value
$L$	59.5
$CF$	411
$f$	123
$C$	10

Substitute into the Formula

$$P_{75} = 59.5 + \frac{450 - 411}{123} \times 10$$

Simplify

$$P_{75} \approx 59.5 + 3.17 = 62.67 \text{ kg}$$
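Deciles and percentiles use the same interpolation and differ only in how the position is computed ($kN/10$ or $kN/100$). The sketch below makes the number of parts a parameter. The function name `partition_value` and the use of `bisect_left` to find the class are choices made for this illustration.

```python
from bisect import bisect_left
from itertools import accumulate

lower_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
frequencies = [7, 126, 278, 123, 62, 4]
width = 10                                   # common class width (assumed equal)
cumulative = list(accumulate(frequencies))
total = cumulative[-1]


def partition_value(k, parts):
    """k-th value when the data is split into `parts` equal parts (10 or 100)."""
    if not 0 < k <= parts:
        raise ValueError("k must satisfy 0 < k <= parts")
    position = k * total / parts
    i = bisect_left(cumulative, position)    # first class whose CF >= position
    cf_before = cumulative[i - 1] if i > 0 else 0
    return lower_boundaries[i] + (position - cf_before) / frequencies[i] * width


d4 = partition_value(4, 10)      # 4th decile
p75 = partition_value(75, 100)   # 75th percentile
p40 = partition_value(40, 100)   # same position as the 4th decile

print(round(d4, 2), round(p75, 2))
```

For the weights table, the 4th decile is about 53.35 kg and the 75th percentile is about 62.67 kg. The 40th percentile equals the 4th decile, as both sit at position 240.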

3.5 Measures of Dispersion

Measures of dispersion describe how spread out the data values are around a central value.
They help us understand variability, consistency, and stability of data.

Common Measures of Dispersion

3.5.1 Range

The range is the simplest measure of dispersion.
It shows the difference between the largest and smallest values.

$$\text{Range} = x_{\max} - x_{\min}$$

Example:

Data:

3.5.2 Interquartile Range (IQR)

The interquartile range measures the spread of the middle 50% of data.

$$IQR = Q_3 - Q_1$$

Example:

Given:

3.5.3 Semi-Interquartile Range (SIQR)

The semi-interquartile range is half of the interquartile range.

$$SIQR = \frac{Q_3 - Q_1}{2}$$

Example:

3.5.4 Mean Deviation (MD)

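Mean deviation measures the average deviation of values from the mean.

$$MD = \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n}$$

⚠️ Positive and negative deviations from the mean cancel out, so this sum is always zero.

Example:

Given Data

2, 4, 6

Find the Mean

$$\bar{x} = \frac{2 + 4 + 6}{3} = 4$$

Find Deviations from the Mean

$x_i$	$x_i - \bar{x}$
2	$2 - 4 = -2$
4	$4 - 4 = 0$
6	$6 - 4 = +2$

Apply the Mean Deviation Formula

$$MD = \frac{(-2) + 0 + 2}{3} = 0$$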

⚠️ Important Observation

Although the data values are spread out, the mean deviation is zero because:

Positive deviations cancel negative deviations
This is why mean deviation is rarely used in practice

To avoid this cancellation problem, we use:

Mean Absolute Deviation (MAD)
Variance
Standard Deviation
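The cancellation is easy to see numerically. This sketch uses the values 2, 4, 6 from the deviation table above and compares the signed mean deviation with the mean absolute deviation. The variable names are illustrative.

```python
data = [2, 4, 6]
n = len(data)
mean = sum(data) / n

signed_deviations = [x - mean for x in data]          # -2, 0, +2
mean_deviation = sum(signed_deviations) / n           # signs cancel

absolute_deviations = [abs(d) for d in signed_deviations]
mean_absolute_deviation = sum(absolute_deviations) / n

print(signed_deviations, mean_deviation)
print(absolute_deviations, round(mean_absolute_deviation, 2))
```

The signed version is 0 even though the values are spread out, while taking absolute values first gives $4/3 \approx 1.33$.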

3.5.5 Mean Absolute Deviation (MAD)

Mean absolute deviation removes the sign by using absolute values.

$$MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$

Example:

Data:

Mean:

3.5.6 Median Absolute Deviation

This measure calculates deviation from the median.

Used when data contains outliers.

Example:

data:

Arrange the Data

2, 4, 5, 6, 100

Find the Median

The middle value is:

$$Md = 5$$

Find Absolute Deviations from the Median

$x_i$	Deviation from Median $|x_i - Md|$
2	3
4	1
5	0
6	1
100	95

Apply the Formula
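The final formula is not recoverable from these notes, and two conventions are in common use: averaging the absolute deviations from the median, or taking their median. The sketch below computes both for the values in the table (2, 4, 5, 6, 100). Which one the lecture intends is left open here.

```python
import statistics

data = [2, 4, 5, 6, 100]

center = statistics.median(data)                     # middle value of the sorted data
abs_deviations = [abs(x - center) for x in data]     # |x_i - Md|

# Convention A (assumption): mean of the absolute deviations from the median
mean_abs_dev_about_median = sum(abs_deviations) / len(data)

# Convention B (assumption): median of the absolute deviations
median_abs_dev = statistics.median(abs_deviations)

print(center, abs_deviations)
print(mean_abs_dev_about_median, median_abs_dev)
```

Convention A gives 20 and is still pulled up by the outlier 100. Convention B gives 1 and is almost unaffected by it, which is why it is often preferred for data with outliers.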

3.5.7 Variance

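Variance measures how far data values spread around the mean.

Population Variance

Let $x_1, x_2, \dots, x_N$ be a population with mean $\mu$.

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Alternative form:

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2$$

Sample Variance

For sample data:

When values are unknown

$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

When values are known

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Alternative form:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$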
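A short numerical check of the definitional and alternative forms, assuming the usual conventions: the population variance divides by $N$ and the sample variance divides by $n - 1$. The data 2, 4, 6 is reused from the mean deviation example purely for illustration.

```python
import statistics

data = [2, 4, 6]
n = len(data)
mean = sum(data) / n
sum_of_squares = sum(x * x for x in data)

# Definitional forms from the standard library
population_variance = statistics.pvariance(data)    # divides by N
sample_variance = statistics.variance(data)         # divides by n - 1

# Alternative (computational) forms, under the same divisor assumptions
population_alt = sum_of_squares / n - mean ** 2
sample_alt = (sum_of_squares - n * mean ** 2) / (n - 1)

print(round(population_variance, 4), round(population_alt, 4))
print(sample_variance, sample_alt)
```

Both forms agree: about 2.67 for the population variance and 4 for the sample variance. The alternative form avoids computing each deviation but can lose precision when the values are large relative to their spread.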

3.5.8 Standard Deviation

The standard deviation is the square root of variance.
It has the same unit as the original data.
It describes how far data values typically spread from the mean.
Sensitive to extreme values (outliers)

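Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Sample Standard Deviation

When values are unknown

$$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

When values are known

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$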

Example:

If:

Then:

Interpretation

Small standard deviation → data values are close to the mean
Large standard deviation → data values are widely spread

3.5.9 Coefficient of Variation (CV)

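The coefficient of variation compares relative variability between datasets by expressing the standard deviation as a percentage of the mean.

$$CV = \frac{s}{\bar{x}} \times 100\%$$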

Interpretation

Higher CV → More variability, less consistency, less stability
Lower CV → Less variability, more consistency, more stability

Example:

3.5.10 Variance for Grouped Data

For grouped data with class midpoints ($x_i$) and frequencies ($f_i$):

Where:

Example :

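Class Interval	$f_i$	$x_i$
10–19	5	14.5
20–29	8	24.5
30–39	7	34.5

Total Frequency

$$\sum f_i = 5 + 8 + 7 = 20$$

Mean of Grouped Data

$f_i$	$x_i$	$f_i x_i$
5	14.5	72.5
8	24.5	196
7	34.5	241.5

$$\bar{x} = \frac{72.5 + 196 + 241.5}{20} = \frac{510}{20} = 25.5$$

Compute $(x_i - \bar{x})^2$

$x_i$	$x_i - \bar{x}$	$(x_i - \bar{x})^2$
14.5	$-11$	121
24.5	$-1$	1
34.5	$9$	81

Multiply by Frequencies

$f_i$	$(x_i - \bar{x})^2$	$f_i (x_i - \bar{x})^2$
5	121	605
8	1	8
7	81	567

$$\sum f_i (x_i - \bar{x})^2 = 605 + 8 + 567 = 1180$$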

Variance
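The final division is not recoverable from these notes, so the sketch below computes the weighted sum of squared deviations from the tables and then shows both common choices of divisor. Which one the lecture uses (total frequency, or total frequency minus one) is an assumption left open here.

```python
midpoints = [14.5, 24.5, 34.5]     # class midpoints x_i
frequencies = [5, 8, 7]            # class frequencies f_i

total_frequency = sum(frequencies)
grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / total_frequency

# Sum of f_i * (x_i - mean)^2 across classes
weighted_squares = sum(
    f * (x - grouped_mean) ** 2 for f, x in zip(frequencies, midpoints)
)

variance_population_style = weighted_squares / total_frequency       # divisor: sum of f
variance_sample_style = weighted_squares / (total_frequency - 1)     # divisor: sum of f - 1

print(total_frequency, grouped_mean, weighted_squares)
print(variance_population_style, round(variance_sample_style, 2))
```

With a mean of 25.5 and a weighted sum of squares of 1180, the variance is 59 when dividing by 20 and about 62.11 when dividing by 19.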